Combining Shallow and Deep Processing for NLP

Authors

  • Erhard Hinrichs
  • Kiril Simov
  • Josef van Genabith
  • Pavel Braslavski
  • Ronald M. Kaplan
  • John T. Maxwell
  • Tracy Holloway King
  • Richard S. Crouch
  • Milen Kouylekov
  • Hristo Tanev
  • Ulrich Schaefer
  • Gerold Schneider
  • Alexander Simov
  • Petya Osenova
Abstract

This paper presents a strategy for syntax-based ranking of documents, specifically oriented to Question Answering (QA). The strategy is intended to limit the number of documents processed by the answer extraction module of a syntax-oriented QA system. Several measures for statistical scoring of expressions are presented and evaluated on 400 factoid questions from the TREC-12 competition. We prove that syntax-based document filtering can outperform classical inverse document frequency (idf) approaches.
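For orientation, the classical idf baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`tokenize`, `idf_table`, `rank_by_idf`), the regular-expression tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions made here, and the paper's syntax-based measures are not reproduced.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n = len(documents)
    df = {}
    for doc in documents:
        for term in set(tokenize(doc)):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def rank_by_idf(question, documents, top_k=2):
    """Return the indices of the top_k documents, scored by summed idf of shared terms."""
    idf = idf_table(documents)
    q_terms = set(tokenize(question))
    scored = []
    for index, doc in enumerate(documents):
        shared = q_terms & set(tokenize(doc))
        score = sum(idf.get(term, 0.0) for term in shared)
        scored.append((score, index))
    # Highest score first; ties are broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]
```

Only the documents returned by `rank_by_idf` would be passed on to the answer extraction module. A syntax-based filter would replace the bag-of-words overlap in the scoring loop with a score over matched syntactic expressions.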

Similar resources

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Middleware for Creating and Combining Multi-dimensional NLP Markup

We present the Heart of Gold middleware by demonstrating three XML-based integration scenarios where multidimensional markup produced online by multilingual natural language processing (NLP) components is combined to deliver rich, robust linguistic markup for use in NLP-based applications like information extraction, question answering and semantic web. The scenarios include (1) robust deep-shal...

Integrating deep and shallow natural language processing components: representations and hybrid architectures

We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processin...

An Algorithm Combining Statistics-based and Rules-based for Chunk Identification of Chinese Sentences

Natural language processing (NLP) is a very hot research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, currently, no mature deep analysis theories and techniques are available. An alternative way is to perform shallow parsing on sentences which is very popular in the domain. The chunk identification is a fundamental task for shallow parsi...

Combining Shallow and Deep NLP Methods for Recognizing Textual Entailment

We combine two methods to tackle the textual entailment challenge: a shallow method based on word overlap and a deep method using theorem proving techniques. We use a machine learning technique to combine features derived from both methods. We submitted two runs, one using all features, yielding an accuracy of 0.5625, and one using only the shallow feature, with an accuracy of 0.5550. Our metho...

Shallow, Deep and Hybrid Processing with UIMA and Heart of Gold

The Unstructured Information Management Architecture (UIMA) is a generic platform for processing text and other unstructured, human-generated data. For text, it has been proposed and is being used mainly for shallow natural language processing (NLP) tasks such as part-of-speech tagging, chunking, named entity recognition and shallow parsing. However, it is commonly accepted that getting interes...

Publication date: 2004